We just saw that the main challenge in Kruskal's is efficiently detecting if adding an edge creates a cycle. The most effective way to solve this is with a specialized data structure called the Disjoint Set Union (DSU), also known as a Union-Find data structure.

  • A DSU tracks a collection of disjoint (non-overlapping) sets and provides two core operations:
    • find(item): Determines which set an item belongs to by finding the "representative" or root of that set.
    • union(item1, item2): Merges the two sets containing item1 and item2.
  • In Kruskal's, we treat each vertex as its own set initially. For each edge (u, v) we consider:
    • We check if find(u) is the same as find(v).
    • If they are the same, u and v are already in the same component. Adding the edge would create a cycle, so we discard it.
    • If they are different, we add the edge to our MST and perform a union(u, v) to merge their components.
  • The total time complexity for Kruskal's is dominated by the initial edge sort, making it $O(E \log E)$.
  • The DSU operations, with optimizations, are so fast they are considered nearly constant time on average.
Python: DSU with Path Compression & Union by Size
1class DSU:
2    def __init__(self, n):
3        # parent[i] < 0 means i is a root, and -parent[i] is its size
4        self.parent = [-1] * n
5
6    def find(self, i):
7        if self.parent[i] < 0:
8            return i
9        # Path compression: set parent directly to the root
10        self.parent[i] = self.find(self.parent[i])
11        return self.parent[i]
12
13    def union(self, i, j):
14        root_i = self.find(i)
15        root_j = self.find(j)
16        if root_i != root_j:
17            # Union by size: attach smaller tree to root of larger tree
18            if self.parent[root_i] < self.parent[root_j]:
19                self.parent[root_i] += self.parent[root_j]
20                self.parent[root_j] = root_i
21            else:
22                self.parent[root_j] += self.parent[root_i]
23                self.parent[root_i] = root_j
24            return True
25        return False
26
27if __name__ == '__main__':
28    # Vertices: 0=A, 1=B, 2=C, 3=D
29    dsu = DSU(4)
30    print(f"Initial parent array: {dsu.parent}")
31    dsu.union(0, 1) # Union A and B
32    print(f"After union(A, B):   {dsu.parent}")
33    dsu.union(2, 3) # Union C and D
34    print(f"After union(C, D):   {dsu.parent}")
35    dsu.union(0, 2) # Union sets {A,B} and {C,D}
36    print(f"After union(A, C):   {dsu.parent}")